Figure 1: Our shape retrieval pipeline. The first row shows the pipeline for processing the input sketch. The input sketch is pre-segmented 
into semantic parts (color-coded) (b), which are assigned to different image regions and then grouped into a pyramid (c). Gabor features 
are then extracted from each group of parts (d), which are concatenated into the Pyramid-of-Parts (e). The last two rows illustrate 
the processing of database models. A pre-segmented 3D model (a) is first rendered into a 2D model contour using Suggestive Contours (b), 
where segmentation is transferred from the model. After that, the process is similar to processing of the input sketch. Finally, the input sketch 
is matched with each contour (or view) of every model, and a ranked list of similar models is returned (f). 


Abstract 

We present a multi-scale approach to sketch-based shape retrieval. 
It is based on a novel multi-scale shape descriptor called 
Pyramid-of-Parts, which encodes the features and spatial relationship 
of the semantic parts of query sketches. The same descriptor can also 
be used to represent 2D projected views of 3D shapes, allowing 
effective matching of query sketches with 3D shapes across multiple 
scales. Experimental results show that the proposed method 
outperforms the state-of-the-art method, whether the sketch 
segmentation information is obtained manually or automatically by 
considering each stroke as a semantic part. 
1 Introduction 

Due to their simplicity and intuitiveness, sketch-based interfaces 
have been popular for 3D shape retrieval. A standard approach is 
to turn the 2D-3D matching problem involved into a 2D-2D matching 
problem by first rendering every 3D repository model as 2D 
contours under multiple views and then matching the query sketch 
with every resulting contour. How to define effective features to 
represent both input sketches and 2D contours is a key challenge 
in sketch-based shape retrieval. We use the terms "2D contours of 
models" and "model sketches" interchangeably in the following 
discussion. 

Various feature representations (e.g., Gist [Oliva and Torralba 2001], 
Spherical Harmonics [Funkhouser et al. 2003], Eccentricity [Li and 
Johan 2012]) have been proposed to represent both queries and 
model sketches globally. Such global sketch representations are 
able to encode high-level shape information, but are sensitive to 
intra-class variation and shape deformation. Recently, the BoW 
representation [Eitz et al. 2012] has been brought into the field to 
address this problem. This representation is based on the statistics 
of local features, such as GALIF [Eitz et al. 2012] and SIFT [Li et 
al. 2013], and it has proven more robust against the variations in the 
query and model sketches. However, approaches based on this 
representation may easily return locally similar but globally very 
different models (Figure 2). This is because the local features are 
still defined at the pixel level, without leveraging any high-level 
semantics, such as the semantically meaningful parts in the sketch 
and the 3D models. 

Part-level representations have been proven useful for object 
detection and recognition [Felzenszwalb and Huttenlocher 2005] in 
the computer vision community. However, existing sketch 
representations are still largely defined at the pixel level only. On 
the other hand, many techniques have been proposed to consistently 
decompose a set of 3D models into semantically meaningful 
parts [Kalogerakis et al. 2010; Huang et al. 2011; Hu et al. 2012]. 
In recent years, several techniques have also been developed to 
semantically segment freehand sketches, either automatically or 
interactively [Noris et al. 2012; Sun et al. 2012; Huang et al. 2014]. 
Hence, it is interesting to explore whether the use of semantic parts 
could lead to more discriminative sketch representations for retrieval. 

In this paper, we present a new sketch representation, called 
Pyramid-of-Parts, for sketch-based shape retrieval. Our 
representation is derived from the available part-level information 
associated with the query sketch and the 3D repository models. We 
consider two ways of obtaining sketch segmentation information: 
manually specified, and automatically obtained simply by assuming 
that each input stroke forms a semantic part. As the semantic 
segmentation of the query sketch and that of the 3D models might be 
different or at different levels of detail due to the multi-scale 
nature of objects, we adapt the idea of image pyramids to encode 
semantic parts in a multi-scale manner. Our retrieval algorithm then 
compares the Pyramid-of-Parts of the input sketch with those of the 
3D models across different scales and returns the list of models 
ranked according to how well they match the input sketch 
semantically. 



Figure 2: Compared with [Eitz et al. 2012], our sketch-based 
retrieval method based on the Pyramid-of-Parts returns more relevant 
models. The segmentation in the query sketch is color-coded. 


Thanks to the Pyramid-of-Parts, our sketch-based shape retrieval 
technique outperforms the state-of-the-art techniques, which are 
based on either a local descriptor [Eitz et al. 2012] or a global 
descriptor [Zou et al. 2014], on two sketch datasets. We present a new 
sketching interface that supports the commonly used coarse-to-fine 
drawing practice and naturally provides semantically segmented 
sketches as query sketches. Since our representation encodes the 
spatial information of the query sketch, our technique often produces 
the desired results even if only a subset of parts is depicted in a 
query sketch (Figure 3), and it performs better than [Eitz et al. 
2012] in this task. We refer to this kind of matching based on 
incomplete input sketches as incomplete matching in this paper. 

Figure 3: Given some incomplete query sketches (top row), our 
method is able to return the desired models (bottom row). 

2 Related Work 

Sketch-based Shape Retrieval. This problem is often tackled 
by finding a repository model whose rendered 2D contour (i.e., a 
silhouette rendering of the 3D model) under a certain viewpoint 
best matches the query sketch. Existing solutions mainly differ 
in the feature descriptors used to represent query/model sketches 
and can be largely categorized into two groups. The first group 
of approaches makes use of global descriptors (e.g., [Löffler 2000; 
Funkhouser et al. 2003]) to represent the sketch globally. However, 
global descriptors are sensitive to intra-class variations and 
shape deformation, which bring global changes to the descriptors. 
Global descriptors also have difficulty in handling incomplete 
query sketches, as the missing information affects the descriptors 
as well. 

The second group of approaches uses statistics about local 
descriptors for sketch representation. For example, Yoon et al. [2010] 
represent sketches using statistics of their diffusion tensor fields, 
leading to a histogram of orientations. Saavedra et al. [2012] 
represent a sketch by the HOG feature of its "key shape", which is an 
approximation of the contour with straight lines. Eitz et al. [2012] 
adopt the Bag-of-Features (BoF) model to represent a sketch as 
a histogram of visual words. These methods strike a balance 
between local and global features of 2D shapes, and are able to 
tolerate the inaccuracies inherent in sketches to some extent. 
However, since they discard the spatial relationships among local 
descriptors, they often return unrelated shapes as similar (Figure 2). 
Such spatial information of local descriptors can be captured by our 
proposed Pyramid-of-Parts, resulting in more discriminative power. 
Moreover, there are some other methods (e.g., [Shao et al. 2011; Li 
and Johan 2013]) that directly align the query sketch to rendered 
model views to compute sketch-to-model distances. The main limitation 
of these approaches is that they are usually computationally 
inefficient. 

Sketch-based Image Retrieval. Sketch-based image retrieval has 
been under intensive research since the 1990s, and many shape 
descriptors have been explored. (See [Eitz et al. 2010] for a survey.) 
Among them, Histograms of Oriented Gradients (HOG) [Dalal and 
Triggs 2005] and Shape Context [Belongie et al. 2002] have 
become very popular due to their simplicity, generality and 
discriminative power. The BoF model can be built upon these local 
descriptors [Eitz et al. 2011], and the resulting feature is shown to 
be more tolerant to sketch variations. These shape descriptors can 
be extended to multiple scales, leading to, for example, multi-scale 
HOG [Hu et al. 2010] and multi-sample BoF [Wu et al. 2009], 
where the sampled image patches do not have a fixed size. Compared 
to these works, our multi-scale descriptor is defined over semantic 
parts in the sketch, rather than image patches whose content 
might bear no semantic information. 

Part-based Models. In recent years, part-based models have been 
widely used in the computer vision community for the detection or 
recognition of objects in images. For example, Felzenszwalb and 
Huttenlocher [2005] present a pictorial structure model to encode 
the relationship among different body parts. More recently, 
Felzenszwalb et al. [2010] introduce the Deformable Part Model (DPM), 
which is able to successfully identify complex objects. Ferrari et 
al. [2008] propose a kAS feature for object detection, with each 
"part" constructed by linking k roughly straight adjacent contour 
segments. However, it is unclear how to apply these models, which 
are designed for object detection, to our shape retrieval problem. 

Image Pyramids. Since the pioneering works in [Burt 1981; Burt 
and Adelson 1983], pyramid methods have been extensively used 
for image analysis to capture the underlying patterns at multiple 
scales. For example, Lazebnik et al. [2006] introduce the idea of 
spatial pyramid matching (SPM) for natural scene categorization. 
SPM uses features extracted in regions of different sizes, and 
organizes them into a spatial pyramid. This idea has later been 
extended and used in many applications, such as image 
classification [Yang et al. 2009], image matching [Shrivastava et al. 
2011] and 3D object recognition [Li and Guskov 2007; Redondo-Cabrera 
et al. 2012]. We also adopt this idea of a spatial pyramid, since it 
has proven to be more effective than single-level approaches. However, 
unlike existing methods, which construct pyramids of pixels, our 
method constructs a pyramid of semantic parts. 

3 Pyramid-of-Parts 

We assume that both the input query sketch and the 2D model 
contours have been pre-segmented into semantic parts. Due to the 
multi-scale nature of objects, it is not uncommon that a query sketch 
and a model contour correspond to the same object but have 
different segmentations. It is thus important to know the scale of 
each part and compare parts only at the same scales. 

Since the parts do not carry semantic labels, it is challenging to 
form a semantically meaningful hierarchy of parts for 
matching. Instead, we adapt the idea of image pyramids to our 
problem for part scale normalization. Note that the query sketch 
and the model contours are both represented by pyramids of parts. 
The use of a common pyramid for both of them not only makes it 
possible to compare parts at the same scales but also captures the 
multi-scale nature of objects. Although the following discussion 
focuses mainly on how we process input sketches, model contours 
are processed in exactly the same way. 

Definition. Like image pyramids, a Pyramid-of-Parts consists of 
multiple scale levels, with each level containing groups of 
pre-segmented parts in the input sketch. The groups of parts at the 
same level have similar scales, and upper levels have larger 
scales, as illustrated in Figure 4. 

3.1 Pyramid Generation 

To generate a Pyramid-of-Parts, we associate each level with a set 
of regions in the sketch image. Let $R_i^j$ denote the i-th region at 
level j. For example, $R_1^1, \ldots, R_9^1$ are the nine regions at 
level 1, as illustrated in Figure 4. Each region corresponds to a 
group of parts. Note that a region might be associated with zero, one 
or multiple parts (examples of each are marked in Figure 4). The parts 
associated with a region $R_i^j$ are considered as one group of parts 
at level j. Since groups of parts at upper levels are of larger scale, 
we require larger regions at upper levels. 

The main criterion to determine if a part a belongs to a region R is 
whether a is enclosed by R or not. When a is completely 
within R (e.g., the leg part in Figure 4), it is easy to conclude 
that a should be in the group of parts associated with R. However, 
the problem becomes tricky when a covers multiple regions (e.g., 
the red arm in Figure 4, which covers two regions). A part may cover 
multiple regions at the current level mainly because it should belong 
to a region at an upper level. For example, the body part in Figure 4 
belongs to a region at an upper level, instead of any of the 
lower-level regions that it overlaps. 

With the above observations, we determine if a is assigned to R by 
considering both the inclusion of a inside R and the relative size of 
a to R. Specifically, we formulate the likelihood of a belonging to 
R as $p(a)$: 

$$p(a) = W_s(a)\, W_i(a), \qquad (1)$$

where $W_s(a)$ imposes a penalty when the size of a, denoted as $S$ 
and defined as the longer side of the bounding box of a, is larger 
than $\beta L$. Here $\beta$ is a parameter ($\beta = 0.85$ in our 
implementation) and $L$ is the length of the longer side of R. 
Precisely, when $S \le \beta L$, we set $W_s(a) = 1$, corresponding to 
no penalty. Otherwise, $W_s(a)$ decreases as $S$ deviates from 
$\beta L$, and is computed as 

$$W_s(a) = 1.1\, e^{-(S - \beta L)^2} - 0.1,$$

and $W_s(a)$ is clamped to 0 if it becomes negative. 

Even if the size of a is smaller than the size of R, it is still possible 
that a extrudes from R, since a is not necessarily centered at R. 
We thus use $W_i$ to penalize the extrusion of a from R. Let $|a|$ be 
the stroke length of a and $a \cap R$ be the part of a inside R. We 
then have $W_i(a) = |a \cap R| / |a|$, which reaches its maximum 
when a is completely inside R. 

In the end, a is assigned to R if $p(a) > 0.5$. We also compute 
a reliability value for each region to quantify the certainty of the 
assignments made to this region, which in later stages is used 
to downplay regions that have many uncertain assignments of 
parts. Let $\{a_1, a_2, \ldots, a_n\}$ be the parts assigned to region 
R. The reliability of R is then computed as 
$c(R) = \sum_i w_i\, p(a_i)$, where $w_i = |a_i| / \sum_j |a_j|$. Each 
part will eventually get assigned to at least one region, because the 
topmost level region ($R_1^3$ in Figure 4) covers the entire image. 
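
The assignment rule of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our implementation: the function names are hypothetical, the stroke lengths inside and outside a region are assumed to be precomputed, and the exponent of the size penalty is taken to be -((S - βL)/σ)², where σ is a hypothetical scale parameter added for this sketch.

```python
import math

def size_weight(S, L, beta=0.85, sigma=1.0):
    """W_s: no penalty while S <= beta * L, then a Gaussian-like falloff.

    `sigma` is a hypothetical scale for the exponent (an assumption of this sketch).
    """
    if S <= beta * L:
        return 1.0
    w = 1.1 * math.exp(-((S - beta * L) / sigma) ** 2) - 0.1
    return max(w, 0.0)  # clamp negative values to 0

def inclusion_weight(len_inside, len_total):
    """W_i: fraction of the part's stroke length that lies inside the region."""
    return len_inside / len_total if len_total > 0 else 0.0

def likelihood(S, L, len_inside, len_total, beta=0.85, sigma=1.0):
    """p(a) = W_s(a) * W_i(a); the part is assigned to the region if p(a) > 0.5."""
    return size_weight(S, L, beta, sigma) * inclusion_weight(len_inside, len_total)

def region_reliability(parts):
    """c(R) = sum_i w_i * p(a_i), with w_i = |a_i| / sum_j |a_j|.

    `parts` is a list of (stroke_length, likelihood) pairs of the assigned parts.
    """
    total = sum(length for length, _ in parts)
    if total == 0:
        return 0.0
    return sum(length / total * p for length, p in parts)
```

For instance, a part whose bounding box is half the region size and which has 80% of its stroke length inside the region obtains a likelihood of 0.8 and is therefore assigned to that region.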

3.2 Feature Extraction 

The Pyramid-of-Parts feature is the concatenation of the features 
extracted from all the regions. Each of the regions in 
the pyramid is either empty or contains a group of parts. For empty 
regions, the features are simply all zeros. For the others, the 
features are Gabor features extracted from the groups of parts in them, 
as shown in Figure 5. A group is first placed in a bounding square, 
and then convolved with a set of Gabor filters. Each response is 
averaged over a grid (Figures 5(c) and 5(d)), and the outcome becomes 
part of the final feature (Figure 5(e)). The parameters of the Gabor 
filters are different for each level of the pyramid, and are discussed 
in Section 5.1. 





Figure 5: Extracting the feature for a group of parts with Gabor 
filters. The group is first placed in a bounding square (a), and then 
convolved with a set of Gabor filters (b), each resulting in a response 
map (c). The response maps are further averaged over a coarse grid (d) 
and concatenated into the final feature (e). 
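
The grid averaging and concatenation steps can be illustrated as follows. This is a minimal sketch, assuming the Gabor response maps have already been computed and are given as square lists of rows whose side length is divisible by the grid size; the function names are hypothetical.

```python
def grid_average(response, grid):
    """Average a square 2D response map over a grid x grid layout of cells.

    The side length of `response` is assumed to be divisible by `grid`
    (an assumption of this sketch).
    """
    size = len(response)
    cell = size // grid
    feature = []
    for gy in range(grid):
        for gx in range(grid):
            total = 0.0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    total += response[y][x]
            feature.append(total / (cell * cell))
    return feature

def group_feature(responses, grid):
    """Concatenate the grid-averaged responses of all filters in the bank."""
    feature = []
    for response in responses:
        feature.extend(grid_average(response, grid))
    return feature
```

With this layout, the feature of one region has (number of filters) × (grid size)² entries, and an empty region would simply contribute a zero vector of the same length.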


4 Shape Retrieval Framework 

Our shape retrieval engine is built upon the Pyramid-of-Parts 
feature, and the pipeline is shown in Figure 1. As in [Eitz et al. 
2012], we take a 2D-to-2D matching approach, i.e., matching the input 
sketch against all the views of all the models in the database. To 
construct the database, we render each model for each selected view 
using Suggestive Contours [DeCarlo et al. 2003] (Figure 1(b)), and 
extract its Pyramid-of-Parts feature (Figures 1(d) and 1(e)). Given 
a query sketch, its Pyramid-of-Parts feature is extracted and 
matched against all the features in the database, after which the top 
matched models are retrieved. 

4.1 The Query Sketch 

The input query sketch consists of a set of strokes drawn by the user. 
The sketch is scaled such that a fixed-size canvas (of resolution 
320 × 320 in our implementation) forms its bounding square. To 
extract the Pyramid-of-Parts feature, segmentation of the sketch is 
required, which can be done automatically [Sun et al. 2012; Huang 
et al. 2014] or manually [Noris et al. 2012]. For maximum accuracy, 
here we opt for the manual approach, which is discussed in detail 
in Section 5.1. 

Figure 4: Constructing a 3-level Pyramid-of-Parts by assigning the semantic parts (color-coded) to different regions of the pyramid: (a) Level 1, (b) Level 2, (c) Level 3. $R_i^j$ is the i-th region in the j-th level. 

4.2 Database Construction 

To construct a 3D database, we need a set of segmented 3D models. 
This database stores the Pyramid-of-Parts features of the 2D 
contours of each segmented model under a selected set of views. 

The procedure of computing the Pyramid-of-Parts feature of a 3D 
model is shown in the second and third rows of Figure 1. Given 
a segmented 3D model, a 2D model contour is generated from 
a given view of the model using Suggestive Contours [DeCarlo 
et al. 2003], with the segmentation information transferred from 
the 3D model (Figure 1(b)). The semantic parts are then processed 
into a Pyramid-of-Parts (Figure 1(c)), which is used to produce the 
Pyramid-of-Parts feature (Figure 1(d)) following the procedures 
described in Section 3. Multiple views are used for each model, which 
are representative views generated using [Zou et al. 2014]. The 
view generation process starts by sampling many views uniformly 
distributed on the viewpoint sphere, among which 42 views covering 
most of the information given by the dense views are selected. 

4.3 Retrieval 

To retrieve a model, the Pyramid-of-Parts feature of the sketch is 
compared with all the Pyramid-of-Parts features in the database, 
and the K nearest neighbors are returned as matches. The distance 
between two Pyramid-of-Parts features is the weighted sum of the 
distances between the constituent Gabor features in the corresponding 
regions. Let $x = (x_1, x_2, \ldots, x_n)$ be a Pyramid-of-Parts 
feature, where $x_i$ is the Gabor feature of region $R_i$. (The level 
is not important here.) The distance between two features $x$ and $y$ 
is computed as: 

$$D(x, y) = \sum_{i=1}^{n} w_i \, \lVert x_i - y_i \rVert \qquad (2)$$

The weight $w_i$ is proportional to the product of region importance 
and reliability. For region $R_i$, the importance $m_i$ is set roughly 
proportional to the area of the region. In our 3-level implementation, 
$m_i$ is set to 1, 4 and 9 for regions at levels 1, 2 and 3, 
respectively. The reliability $c(R_i)$ is the degree of certainty of 
assigning the semantic parts in $R_i$ to $R_i$, as described in 
Section 3.1. Given these quantities, the unnormalized weight is 
$w'_i = m_i\, c(R_i)$, and the final weight is 
$w_i = w'_i / \sum_j w'_j$. 
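
The weighted distance can be sketched as below. This is an illustration rather than our implementation: the function names are hypothetical, the norm is assumed to be Euclidean, and the weights are assumed to be normalized so that they sum to 1.

```python
import math

# m_i per level, as stated for the 3-level setup
LEVEL_IMPORTANCE = {1: 1, 2: 4, 3: 9}

def region_weights(levels, reliabilities):
    """w_i proportional to m_i * c(R_i), normalized to sum to 1 (assumed)."""
    raw = [LEVEL_IMPORTANCE[lv] * c for lv, c in zip(levels, reliabilities)]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [0.0] * len(raw)

def pyramid_distance(x, y, weights):
    """D(x, y) = sum_i w_i * ||x_i - y_i||, using the Euclidean norm (assumed).

    `x` and `y` are lists of per-region feature vectors in the same region order.
    """
    return sum(w * math.dist(xi, yi) for w, xi, yi in zip(weights, x, y))
```

Retrieval then amounts to evaluating `pyramid_distance` between the query feature and every stored view feature and keeping the K smallest distances.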


5 Evaluation 

We have conducted four experiments, Exp. 1-4, to evaluate the 
performance of the proposed method. Exp. 1 evaluates the 
performance when using different region subdivision strategies 
(Section 5.2). Exp. 2 evaluates the performance when using 
user-provided sketch segmentation information (Section 5.3). Exp. 3 
evaluates the performance of a simple, automatic sketch segmentation 
strategy that treats each stroke as a semantic part (Section 5.5). 
Finally, Exp. 4 evaluates the performance on incomplete matching 
(Section 5.6). 

5.1 Experimental Settings 

Tools. We have developed a prototype retrieval system, which we 
used to collect input sketches and evaluate the performance of the 
proposed method. The interface of the system allows users to draw 
three types of strokes: bounding box, segmentation and query 
sketch strokes. The user may draw these strokes in any order. The 
bounding box strokes represent a bounding box of the sketch, which is 
only useful for incomplete matching (Section 5.6), where a bounded 
canvas is needed. The segmentation strokes are used to segment the 
query sketch into semantic parts. Each segmentation stroke is a 
closed curve forming a zone, and each stroke of a query sketch is 
assigned to the zone that contains more than half of it. All the 
query sketch strokes assigned to one zone are assumed to form one 
semantic part. Note that it is possible for one semantic part to lie 
completely within another (e.g., a human eye and head), so that the 
zone for the larger semantic part contains that of the smaller one. 
In this case, a query sketch stroke is assigned to the smaller zone 
only if more than half of the stroke is inside it. Finally, if there 
exist N semantic parts, only N - 1 zones are needed, and the strokes 
not belonging to any zone are assigned to a background zone, which 
also represents one semantic part. 
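
The zone assignment rule can be sketched as follows. This is only an illustration under stated assumptions: strokes and zones are given as lists of (x, y) points, "more than half of a stroke" is measured on the sampled stroke points rather than on stroke length, and nested zones are handled by testing smaller zones first. All function names are hypothetical.

```python
def point_in_polygon(pt, polygon):
    """Ray-casting test; `polygon` lists the (x, y) vertices of a closed curve."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygon_area(polygon):
    """Absolute area via the shoelace formula."""
    n = len(polygon)
    s = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def assign_stroke(stroke, zones):
    """Return the index of the zone holding more than half of the stroke's
    points, preferring smaller zones; return -1 for the background zone."""
    order = sorted(range(len(zones)), key=lambda k: polygon_area(zones[k]))
    for k in order:
        inside = sum(point_in_polygon(p, zones[k]) for p in stroke)
        if inside * 2 > len(stroke):
            return k
    return -1
```

Testing the smaller zone first means that an eye stroke drawn inside a head zone goes to the eye zone, while a head stroke that merely passes through the eye zone stays with the head.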

3D models dataset. Our 3D models come from the PSB 
dataset [Chen et al. 2009]. This dataset contains 380 models in 19 
categories. It contains segmentation results from different 
segmentation methods, and we selected the segmentation results produced 
by Randomized Cut [Chen et al. 2009]. 

Sketch dataset. With our prototype retrieval system, we collected a 
total of 428 complete sketches and 205 partial sketches (see the 
supplemental material). Both the complete and the partial sketches 
cover all 19 categories of the PSB model dataset. Ten users were 
invited to freely draw query sketches after we had shown them an 
example model from each category of the 3D dataset. Users were asked 
to freely specify the segmentation strokes for their drawings. This 
sketch dataset is used in most of the experiments where segmentation 
is needed. To compare with [Eitz et al. 2012] fairly when sketch 
segmentation is not available, we use a subset of their sketch data, 
which includes 395 sketches, covering 10 of the PSB model categories. 
The other sketches used by [Eitz et al. 2012] do not have a 
corresponding category in the PSB model dataset and are thus 
discarded. 

Performance metrics. To quantitatively evaluate the proposed 
method, we have adopted four performance metrics: 1) Precision-recall; 
2) Top One (TO), which measures the precision of the top-one 
results, averaged over all queries; 3) First Tier (FT), which 
measures the precision of the top N results (where N is the number 
of ground-truth models relevant to the query), averaged over all 
queries; and 4) Mean Average Precision (mAP), which summarizes 
the average precision of the ranked lists for all queries. 
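
As a concrete reading of these definitions, the per-query metrics can be sketched as follows. The function names are hypothetical, `ranked` is the ranked list of retrieved model identifiers, and `relevant` is the set of ground-truth models for the query; average precision is computed in its standard form, which we assume matches the definition intended here.

```python
def top_one(ranked, relevant):
    """1.0 if the top-ranked model is relevant, else 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0

def first_tier(ranked, relevant):
    """Precision within the top N results, N = number of relevant models."""
    n = len(relevant)
    if n == 0:
        return 0.0
    return sum(1 for m in ranked[:n] if m in relevant) / n

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant model appears."""
    hits, total = 0, 0.0
    for k, m in enumerate(ranked, start=1):
        if m in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_over_queries(metric, queries):
    """Average a per-query metric over a list of (ranked, relevant) pairs."""
    return sum(metric(r, rel) for r, rel in queries) / len(queries)
```

For example, for the ranked list a, x, b, y with relevant models {a, b}, TO is 1, FT is 0.5 and the average precision is (1 + 2/3) / 2. Averaging `average_precision` over all queries with `mean_over_queries` gives mAP.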

Methods for Comparison. We mainly compared our framework 
with the popular Bag-of-Words framework (denoted as BOW) [Eitz 
et al. 2012], and the global feature based framework (denoted as 
GF) as used in [Eitz et al. 2011], which encodes the whole query 
sketch using a chosen shape descriptor (GALIF in our case). Our 
full method is denoted as OUR-FULL. As all the methods use 
Gabor filters, the parameters of the Gabor filters are set to the 
same values for all of them in all the experiments (as described 
below). 

Parameters of the Gabor filters. Gabor filters are used in all the 
methods compared, and the following parameters are shared among 
them: peak response frequency $\omega_0 = 0.1$, frequency bandwidth 
$\sigma_x = 3$, angular bandwidth $\sigma_y = 9$, and the orientations 
$\theta$ are $0, \pi/6, \pi/3, \pi/2, 2\pi/3, 5\pi/6$. The explanation 
of these parameters can be found in [Eitz et al. 2012]. When averaging 
the Gabor response (Figure 5(d)), the grid size needs to be specified. 
For our method, it is 2×2, 4×4 and 6×6 for the image regions in 
Levels 1, 2 and 3, respectively. For GF, it is 6×6, the same as the 
grid size used for the top-level image region in our method. For BOW, 
it is 4×4 as in [Eitz et al. 2012]. 


Figure 6: Three types of input strokes in our system: bounding box 
(left, gray), sketch (black) and segmentation strokes (right, gray). 


5.2 Exp. 1: Region Subdivision Strategy 

Our method is based on the Pyramid-of-Parts, as illustrated in 
Figure 1. The default number of levels for the Pyramid-of-Parts is 3 
and the subdivision of regions is as shown in Figure 4. In this 
experiment, we study the effect of using different subdivision 
schemes on our retrieval performance: 

• Without overlapping regions: We divide the sketch into four 
1.5L × 1.5L regions, where L is the side length of the square 
region $R_1^1$ at Level 1, as shown in Figure 7(a). This is denoted 
as 4R_NO. 

• Using different ways of constructing Level 2 regions: First, 
we divide the sketch into four different overlapped 2L × 2L 
regions, as shown in Figure 7(b). This is denoted as 4R_O. 
Second, in addition to the original four regions shown in 
Figure 4(b), we add two new 2L × 2L regions to Level 2, as shown 
in Figures 7(c1) and 7(c2). This is denoted as 6R_O. 

• Using a different number of levels: First, we add to 4R_NO one 
more level between the current Levels 2 and 3, with four 3L × 2L or 
2L × 3L regions. This is denoted as 4LV. Second, we remove one 
level (Level 2) from 4R_NO. This is denoted as 2LV. 

In this experiment, the region subdivisions for Levels 1 and 3 are 
fixed. Figure 8 compares the retrieval performances of the above 
schemes. It shows that introducing region overlapping, adding 
more regions and adding more levels all help improve the 
performance. This is because these operations increase the amount of 
information in the resulting feature, improving its discriminative 
power. Since the best performance is obtained using the scheme 
6R_O, we use it in later experiments. 

Figure 7: Different region subdivision schemes: (a) four regions 
without overlapping in Level 2 (i.e., 4R_NO); (b) four regions with 
overlapping in Level 2 (i.e., 4R_O); (c1)(c2) two new regions added 
to Level 2 (i.e., 6R_O); (d) regions of the new level added between 
Levels 2 and 3 (i.e., 4LV). 

Figure 8: Performance comparison of different region subdivision 
schemes. 


5.3 Exp. 2: Full Method Comparison 

In this experiment, we compare our full method (OUR-FULL) to 
the two competing methods, namely the Bag-of-Words model (BOW) 
and retrieval by global feature (GF), over the 428 segmented 
sketches collected using our system. The results are shown in 
Figure 9 as red, purple and black curves, respectively. 

We can see that our method (OUR-FULL) achieves the best 
retrieval performance on all four evaluation metrics. BOW 
achieves the second best average retrieval precision (mAP), but its 
retrieval accuracies evaluated by TO and FT are worse than those of 
GF. These results motivated us to investigate whether it is the 
multi-scale nature of the Pyramid-of-Parts or the use of semantic 
parts that facilitates the better performance of OUR-FULL over BOW 
and GF. 
Figure 9: Retrieval performance on the 428 segmented sketches 
collected using our system. 

Figure 10: Retrieval performance on the 96 sketches whose 
segmentation information is consistent with that of the 3D models in 
the dataset. 


5.4 Multi-scale vs. Semantic Parts 

Since our method adopts two main ideas, the multi-scale nature of 
the Pyramid-of-Parts and the use of semantic parts, we would like to 
understand how the two ideas contribute to the overall retrieval 
performance. Hence, we tested two approaches to evaluate the two ideas 
individually. 

The first approach (denoted as OUR-NOG) skips the grouping 
stage, i.e., it removes the effect of multi-scale. However, as the 
sketches may contain different numbers of semantic parts, if we 
simply extracted a Gabor feature for each part, the final feature 
vectors would be of different lengths for different sketches, making 
them hard to compare. As such, to obtain a fixed-length feature 
vector, we still assign the semantic parts to image regions in one of 
the levels, but each semantic part is assigned to only one region (the 
one having the highest assignment likelihood in Eq. 1). After that, 
the process is the same as in the full method. 

The second approach (denoted as OUR-PIX) removes all the 
information about semantic parts but keeps the multi-scale process. To 
do that, the sketch is rasterized into an image, and all segmentation 
information is discarded. In the grouping stage, the image patch 
bounded by each region is regarded as a part, whose Gabor feature 
is extracted to compose the final feature. 

The average retrieval performances of the two approaches on all 
the sketches are shown in Figure 9. We can see that OUR-PIX 
performs only slightly better than GF, and OUR-NOG performs much 
worse than GF and OUR-PIX, while OUR-FULL performs the 
best. This experiment shows that combining the multi-scale 
representation with semantic parts improves the retrieval performance 
significantly more than using only one of the two ideas. From our 
analyses of the results, we have also found that OUR-NOG often 
performs better on sketches that are segmented into a small number of 
parts by the user, such as the lower diagram shown in Figure 2(a). The 
main reason is that with a small number of parts, the segmentation of 
the 3D models tends to correspond to the segmentation of the input 
sketches. 

To further evaluate this point, we selected all the sketches whose 
user-provided segmentation information is largely consistent 
with that of the 3D models in the dataset. There are 96 such 
sketches in total. Most of these sketches fall into three categories, 
"Cup", "Glass", and "Teddy Bear", where the segmentation is less 
ambiguous. The retrieval results of these query sketches, as shown 
in Figure 10, indicate that OUR-FULL significantly outperforms 
the other methods when the segmentation information of the input 
sketches is consistent with that of the 3D models. 

5.5 Exp. 3: Strokes as Semantic Parts 

In this experiment, we investigate how our method performs when 
segmentation information of the input query sketches is not 
available. A straightforward approach to cope with this problem is to 
consider each stroke as a semantic part. This approach is denoted 
as OUR-STK and is evaluated here over the sketch dataset provided 
by [Eitz et al. 2012]. The results are shown in Figure 11. 

It is interesting to see that OUR-STK performs better than 
BOW and GF. The reason is that users' strokes tend to approximate 
the true segmentation to some extent. These results also 
indicate that our method can achieve higher performance even on 
sketches without user segmentation. 

Figure 11: Retrieval performance using stroke-based segmentation 
on sketches provided by [Eitz et al. 2012]. 


5.6 Exp. 4: Incomplete Matching 

As the Pyramid-of-Parts feature is a collection of Gabor features 
obtained from different image regions, it is possible to compare the 
Pyramid-of-Parts features of some local regions only. This 
characteristic of the Pyramid-of-Parts feature suggests an interesting 
application, incomplete matching, where a partially drawn query 
sketch can be used for model retrieval. Note that incomplete 
matching is not exactly the same as partial matching. With partial 
matching, the input sketch can be matched to any local region of a 
database model. With incomplete matching, we may make use of the 
location information of the drawn strokes relative to the canvas (or 
the user-provided bounding box) so that the matching can be localized. 

Here, it is interesting to compare our method with [Eitz et al. 
2012]. Although [Eitz et al. 2012] can also be used for incomplete 
matching, their method computes global statistics of local 
features, and the comparison itself is therefore global, i.e., it 
compares the global statistics of an incomplete input sketch with 
those of the 2D model contour of a database model. In contrast, with 
our method, we may simply skip the comparison of the Gabor features 
of those regions with no semantic parts in them. 
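
Skipping empty regions can be sketched as a small variation of the weighted distance. This is an illustration only: the function name is hypothetical, the norm is assumed to be Euclidean, empty query regions are detected by their all-zero features, and renormalizing the remaining weights is an assumption of this sketch.

```python
import math

def incomplete_distance(query, model, weights):
    """Weighted distance that skips regions whose query feature is all zeros.

    `query` and `model` are lists of per-region feature vectors in the same
    region order. Renormalizing the remaining weights is an assumption.
    """
    active = [i for i, q in enumerate(query) if any(v != 0 for v in q)]
    total_w = sum(weights[i] for i in active)
    if total_w == 0:
        return 0.0
    return sum(weights[i] * math.dist(query[i], model[i]) for i in active) / total_w
```

In this form, regions that the user left empty do not penalize a model for containing parts there, which is what localizes the matching.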

To evaluate the performance of our method for incomplete matching, 
we collected 205 partial sketches covering all 19 categories of the 
PSB model dataset. The methods for comparison include our full 
method (i.e., OUR-FULL), our method without user segmentation 
but considering each stroke as a part (i.e., OUR-STK), the 
Bag-of-Words model [Eitz et al. 2012] (i.e., BOW) and retrieval using 
global features (i.e., GF). Figure 12 compares the retrieval 
performances of the above methods. Our method outperforms the existing 
methods whether the segmentation information is obtained manually 
or automatically. This is mainly because the competing methods 
do not support localized matching, and thus match the 
incomplete sketch to the complete contours of the models. 



Figure 12: Retrieval performance of incomplete matching. 


6 Conclusion 

In this paper, we have investigated the use of semantic segmentation 
information to improve the performance of sketch-based 3D shape 
retrieval. We proposed the Pyramid-of-Parts to support multi-scale 
matching of semantic parts. With the proposed method, we 
have evaluated the retrieval performances with and without 
user-provided segmentation information. Our experimental results show 
that the proposed method performs better than the state-of-the-art 
method by [Eitz et al. 2012] in both situations. We have also 
compared the two methods with incomplete input sketches. Our 
experimental results show that the proposed method performs 
significantly better than [Eitz et al. 2012]. 